13 research outputs found

    Bio-inspired multisensory integration of social signals

    Emotion understanding represents a core aspect of human communication. Our social behaviours are closely linked to expressing our emotions and understanding others' emotional and mental states through social signals. Emotions are expressed in a multisensory manner: humans use social signals from different sensory modalities such as facial expression, vocal changes, and body language. The human brain integrates all relevant information to create a new multisensory percept and derive emotional meaning. There is great interest in emotion recognition in fields such as HCI, gaming, marketing, and assistive technologies, and this demand is driving an increase in research on multisensory emotion recognition. The majority of existing work proceeds by extracting meaningful features from each modality and applying fusion techniques at either the feature level or the decision level. However, these techniques are ineffective at translating the constant talk and feedback between different modalities. Such constant talk is particularly crucial in continuous emotion recognition, where one modality can predict, enhance and complement the other. This thesis proposes novel architectures for multisensory emotion recognition inspired by multisensory integration in the brain. First, we explore the use of bio-inspired unsupervised learning for unisensory emotion recognition in the audio and visual modalities. We then propose three multisensory integration models, based on different pathways for multisensory integration in the brain: integration by convergence, early cross-modal enhancement, and integration through neural synchrony. The proposed models are designed and implemented using third-generation neural networks, Spiking Neural Networks (SNNs), with unsupervised learning. The models are evaluated using widely adopted, third-party datasets and compared to state-of-the-art multimodal fusion techniques, such as early, late and deep learning fusion. Evaluation results show that the three proposed models achieve results comparable to state-of-the-art supervised learning techniques. More importantly, this thesis demonstrates models that translate constant talk between modalities during the training phase: each modality can predict, complement and enhance the other using constant feedback. This cross-talk between modalities adds insight into emotions compared to traditional fusion techniques.
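
    To make the unsupervised spiking approach concrete, the sketch below is a minimal, assumption-laden toy (not the thesis code): a single leaky integrate-and-fire (LIF) neuron driven by random binary input spikes, with a simplified spike-timing-dependent plasticity (STDP) rule that strengthens synapses whose inputs fire shortly before an output spike. All sizes and constants are illustrative.

        import numpy as np

        # Minimal LIF neuron with a simplified STDP rule (illustrative parameters).
        rng = np.random.default_rng(0)
        n_inputs, n_steps = 20, 500
        dt, tau_m, v_thresh = 1.0, 20.0, 1.0
        a_plus, a_minus, tau_trace = 0.02, 0.01, 20.0

        w = rng.uniform(0.0, 0.5, n_inputs)                              # synaptic weights
        spikes = (rng.random((n_steps, n_inputs)) < 0.05).astype(float)  # random input spike trains

        v, trace = 0.0, np.zeros(n_inputs)                       # membrane potential, presynaptic traces
        for t in range(n_steps):
            trace = trace * np.exp(-dt / tau_trace) + spikes[t]  # decaying memory of recent input spikes
            v = v * np.exp(-dt / tau_m) + w @ spikes[t]          # leaky integration of weighted input
            if v >= v_thresh:
                w += a_plus * trace                              # potentiate inputs that fired just before the spike
                v = 0.0                                          # reset after firing
            else:
                w -= a_minus * spikes[t] * 0.1                   # weak depression for inputs arriving without a post spike
            w = np.clip(w, 0.0, 1.0)

        print("learned weights:", np.round(w, 2))

    In this kind of scheme, repeated co-activation patterns in the input come to dominate the weight vector without any labels, which is the sense in which the learning is unsupervised.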

    Synch-Graph: multisensory emotion recognition through neural synchrony via graph convolutional networks

    Human emotions are essentially multisensory: emotional states are conveyed through multiple modalities such as facial expression, body language, and non-verbal and verbal signals. Multimodal or multisensory learning is therefore crucial for recognising emotions and interpreting social signals. Existing multisensory emotion recognition approaches focus on extracting features from each modality, while ignoring the importance of constant interaction and co-learning between modalities. In this paper, we present a novel bio-inspired approach based on neural synchrony in audio-visual multisensory integration in the brain, named Synch-Graph. We model multisensory interaction using spiking neural networks (SNNs) and explore the use of Graph Convolutional Networks (GCNs) to represent and learn neural synchrony patterns. We hypothesise that modelling interactions between modalities will improve the accuracy of emotion recognition. We have evaluated Synch-Graph on two state-of-the-art datasets and achieved overall accuracies of 98.3% and 96.82%, which are significantly higher than those of existing techniques.
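
    As a rough, hypothetical illustration of the Synch-Graph idea (not the authors' implementation), the sketch below bins two toy spike rasters, uses pairwise correlation of binned spike counts as a stand-in synchrony measure to build a graph adjacency matrix, and applies a single graph-convolution propagation step with the usual symmetric normalisation. The synchrony measure, thresholds and sizes are all assumptions.

        import numpy as np

        rng = np.random.default_rng(1)

        # Toy spike rasters: rows = neurons (audio then visual), columns = time bins.
        audio = (rng.random((8, 200)) < 0.10).astype(float)
        visual = (rng.random((8, 200)) < 0.10).astype(float)
        raster = np.vstack([audio, visual])                            # 16 neurons in total

        # Pairwise correlation of windowed spike counts as a crude synchrony score.
        counts = raster.reshape(raster.shape[0], -1, 10).sum(axis=2)   # 10-bin windows
        corr = np.corrcoef(counts)
        adj = np.where(corr > 0.2, corr, 0.0)                          # keep strongly synchronous pairs
        np.fill_diagonal(adj, 0.0)

        # One GCN propagation step: H' = ReLU(D^-1/2 (A + I) D^-1/2 X W).
        a_hat = adj + np.eye(adj.shape[0])
        d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
        a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt

        x = counts                                                # node features: windowed spike counts
        w = rng.normal(scale=0.1, size=(x.shape[1], 4))           # random, untrained layer weights
        h = np.maximum(a_norm @ x @ w, 0.0)                       # node embeddings after one layer
        print(h.shape)                                            # (16, 4)

    A trained version of such a layer would learn weights so that the resulting node embeddings separate emotion classes; here the weights are random purely to show the data flow.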

    Investigating multisensory integration in emotion recognition through bio-inspired computational models

    Emotion understanding represents a core aspect of human communication. Our social behaviours are closely linked to expressing our emotions and understanding others' emotional and mental states through social signals. The majority of existing work proceeds by extracting meaningful features from each modality and applying fusion techniques at either the feature level or the decision level. However, these techniques are incapable of translating the constant talk and feedback between different modalities. Such constant talk is particularly important in continuous emotion recognition, where one modality can predict, enhance and complement the other. This paper proposes three multisensory integration models, based on different pathways of multisensory integration in the brain: integration by convergence, early cross-modal enhancement, and integration through neural synchrony. The proposed models are designed and implemented using third-generation neural networks, Spiking Neural Networks (SNNs). The models are evaluated using widely adopted, third-party datasets and compared to state-of-the-art multimodal fusion techniques, such as early, late and deep learning fusion. Evaluation results show that the three proposed models achieve results comparable to state-of-the-art supervised learning techniques. More importantly, this paper demonstrates plausible ways to translate constant talk between modalities during the training phase, which also brings advantages in generalisation and robustness to noise.
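
    A hedged sketch of the "integration by convergence" pathway (illustrative only, not the paper's architecture): both modalities feed the same small population of LIF neurons, so a multisensory neuron fires when converging audio and visual evidence together push it over threshold. All sizes and constants are assumptions.

        import numpy as np

        rng = np.random.default_rng(2)
        n_audio, n_visual, n_multi, n_steps = 10, 10, 5, 300
        tau_m, v_thresh, dt = 15.0, 1.0, 1.0

        w_a = rng.uniform(0.0, 0.3, (n_multi, n_audio))            # audio -> multisensory weights
        w_v = rng.uniform(0.0, 0.3, (n_multi, n_visual))           # visual -> multisensory weights

        audio = (rng.random((n_steps, n_audio)) < 0.08).astype(float)
        visual = (rng.random((n_steps, n_visual)) < 0.08).astype(float)

        v = np.zeros(n_multi)
        out_spikes = np.zeros((n_steps, n_multi))
        for t in range(n_steps):
            # Converging input: both modalities drive the same multisensory population.
            v = v * np.exp(-dt / tau_m) + w_a @ audio[t] + w_v @ visual[t]
            fired = v >= v_thresh
            out_spikes[t] = fired
            v[fired] = 0.0                                         # reset neurons that fired

        print("multisensory spike counts:", out_spikes.sum(axis=0))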

    Speech emotion recognition with early visual cross-modal enhancement using spiking neural networks

    Speech emotion recognition (SER) is an important part of the affective computing and signal processing research areas. A number of approaches, especially deep learning techniques, have achieved promising results on SER. However, there are still challenges in translating the temporal and dynamic changes in emotions expressed through speech. Spiking Neural Networks (SNNs) have been demonstrated to be a promising approach in machine learning and pattern recognition tasks such as handwriting and facial expression recognition. In this paper, we investigate the use of SNNs for SER tasks and, more importantly, we propose a new cross-modal enhancement approach. This method is inspired by auditory information processing in the brain, where auditory information is preceded, enhanced and predicted by visual processing during multisensory audio-visual processing. We have conducted experiments on two datasets to compare our approach with state-of-the-art SER techniques in both uni-modal and multi-modal settings. The results demonstrate that SNNs can be an ideal candidate for modelling temporal relationships in speech features and that our cross-modal approach can significantly improve the accuracy of SER.
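
    A minimal sketch of the early cross-modal enhancement idea (an assumption-laden toy, not the proposed model): coarse visual activity arriving a few time steps earlier modulates the gain of the auditory input current before it reaches an auditory LIF population, so auditory input that is "predicted" by vision is enhanced. The lag, gain and sizes are assumptions.

        import numpy as np

        rng = np.random.default_rng(3)
        n_steps, n_aud = 300, 8
        tau_m, v_thresh, dt, lag, gain = 15.0, 1.0, 1.0, 5, 0.5

        audio_in = (rng.random((n_steps, n_aud)) < 0.08).astype(float)   # auditory spike input
        visual_drive = (rng.random(n_steps) < 0.10).astype(float)        # coarse visual activity (e.g. lip movement)
        w = rng.uniform(0.1, 0.4, n_aud)

        v = np.zeros(n_aud)
        spike_count = np.zeros(n_aud)
        for t in range(n_steps):
            # Visual evidence from a few steps earlier boosts the auditory input gain.
            boost = 1.0 + gain * visual_drive[t - lag] if t >= lag else 1.0
            v = v * np.exp(-dt / tau_m) + boost * (w * audio_in[t])
            fired = v >= v_thresh
            spike_count += fired
            v[fired] = 0.0

        print("auditory spikes per neuron:", spike_count)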

    Next-generation capabilities in trusted research environments: interview study

    BACKGROUND: A Trusted Research Environment (TRE; also known as a Safe Haven) is an environment supported by trained staff and agreed processes (principles and standards) that provides access to data for research while protecting patient confidentiality. Accessing sensitive data without compromising the privacy and security of the data is a complex process. OBJECTIVE: This paper presents the security measures, administrative procedures, and technical approaches adopted by TREs. METHODS: We contacted 73 TRE operators in the United Kingdom and internationally, 22 (30%) of whom agreed to be interviewed remotely under a nondisclosure agreement and to complete a questionnaire about their TRE. RESULTS: We observed many similar processes and standards that TREs follow to adhere to the Seven Safes principles. The security processes and TRE capabilities for supporting observational studies using classical statistical methods were mature, and the requirements were well understood. However, we identified limitations in the security measures and capabilities of TREs to support "next-generation" requirements such as a wide range of data types, the ability to develop artificial intelligence algorithms and software within the environment, handling of big data, and timely import and export of data. CONCLUSIONS: We found a lack of software or other automation tools to support the community, and limited knowledge of how to meet next-generation requirements from the research community. Disclosure control for exporting artificial intelligence algorithms and software was found to be particularly challenging, and there is a clear need for additional controls to support this capability within TREs.

    Wearable assistive technologies for autism: opportunities and challenges

    Autism is a lifelong developmental condition that affects how people perceive the world and interact with others. Challenges with typical social engagement, common in the autism experience, can have a significant negative impact on the quality of life of individuals and families living with autism. Recent advances in sensing, intelligent, and interactive technologies can enable new forms of assistive and augmentative technologies to support social interactions. However, researchers have not yet demonstrated the effectiveness of these technologies in long-term real-world use. This paper presents an overview of the social and sensory challenges of autism, which offer great opportunities and challenges for the design and development of assistive technologies. We review existing work on developing wearable technologies for autism, particularly to assist social interactions, analyse their potential and limitations, and discuss future research directions.

    GRAIMATTER Green Paper: Recommendations for disclosure control of trained Machine Learning (ML) models from Trusted Research Environments (TREs)

    TREs are widely and increasingly used to support statistical analysis of sensitive data across a range of sectors (e.g., health, police, tax and education), as they enable secure and transparent research whilst protecting data confidentiality. There is an increasing desire from academia and industry to train AI models in TREs. The field of AI is developing quickly, with applications including spotting human errors, streamlining processes, task automation and decision support. These complex AI models require more information to describe and reproduce, increasing the possibility that sensitive personal data can be inferred from such descriptions. TREs do not have mature processes and controls against these risks. This is a complex topic, and it is unreasonable to expect all TREs to be aware of all risks or that TRE researchers have addressed these risks in AI-specific training. GRAIMATTER has developed a draft set of usable recommendations for TREs to guard against the additional risks when disclosing trained AI models from TREs. The development of these recommendations has been funded by the GRAIMATTER UKRI DARE UK sprint research project. This version of our recommendations was published at the end of the project in September 2022. During the course of the project, we identified many areas for future investigation to expand and test these recommendations in practice; we therefore expect this document to evolve over time. The GRAIMATTER DARE UK sprint project has also developed a minimal viable product (MVP): a suite of attack simulations that can be applied by TREs, available at https://github.com/AI-SDC/AI-SDC. If you would like to provide feedback or would like to learn more, please contact Smarti Reel ([email protected]) and Emily Jefferson ([email protected]). A summary of our recommendations for a general public audience can be found at DOI: 10.5281/zenodo.708951
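
    To give a flavour of the disclosure risk that such attack simulations probe, the snippet below is a generic, hedged illustration (not the AI-SDC suite or its API): a naive membership-inference check that compares a trained classifier's confidence on its training records against unseen records; a large gap suggests the released model leaks information about whether an individual was in the training data. The dataset is synthetic and all names are illustrative.

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        # Synthetic stand-in for sensitive data.
        X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

        model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

        # Confidence assigned to the true class for members vs non-members.
        conf_members = model.predict_proba(X_train)[np.arange(len(y_train)), y_train]
        conf_nonmembers = model.predict_proba(X_test)[np.arange(len(y_test)), y_test]

        gap = conf_members.mean() - conf_nonmembers.mean()
        print(f"mean confidence gap (members - non-members): {gap:.3f}")
        # A large gap is a crude signal of membership-inference risk before release.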

    Scottish Medical Imaging Service: Technical and Governance controls

    Objectives: The Scottish Medical Imaging (SMI) service provides linkable, population-based, "research-ready" real-world medical images for researchers to develop or validate AI algorithms within the Scottish National Safe Haven. The PICTURES research programme is developing novel methods to enhance the SMI service offering through research in cybersecurity and software/data/infrastructure engineering. Approach: Additional technical and governance controls were required to enable safe access to medical images. The researcher is isolated from the rest of the trusted research environment (TRE) using a Project Private Zone (PPZ). This enables researchers to build and install their own software stack and protects the TRE from malicious code. Guidelines are under development for researchers on the safe development of algorithms and the expected relationship between the size of the model and the training dataset. There is associated work on the statistical disclosure control of models to enable safe release of trained models from the TRE. Results: A policy enabling the use of "non-standard software", based on prior research, domain knowledge and experience gained from two contrasting research studies, was developed. Additional clauses have been added to the legal control (the eDRIS User Agreement) signed by each researcher and their Head of Department. Penalties for attempting to import or use malware, to remove data within models, or to deceive or circumvent such controls are severe, and apply to both the individual and their institution. The process of building and deploying a PPZ has been developed, allowing researchers to install their own software. No attempt has yet been made to add additional ethical controls; however, a future service development could be validating the performance of researchers' algorithms on our training dataset. Conclusion: The ability to conduct research using images poses new challenges and risks for those commissioning and operating TREs. The Project Private Zone and our associated governance controls are a major step towards supporting the needs of researchers in the 21st century.